{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Inferencing MNIST ONNX Model using AI-Serving"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The MNIST handwritten digit classification problem is a classic dataset used in computer vision and deep learning, and Convolutional Neural Network (CNN) is the current state-of-art architecture for image classification task.\n",
    "\n",
    "In this tutorial, we will use the Open Neural Network eXchange (ONNX) format to show how to deploy a pre-trained MNIST CNN model using AI-Serving."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites to run the notebook\n",
    "\n",
    "1). Download the pre-trained MNIST ONNX model file (mnist.tar.gz) from [here](https://onnxzoo.blob.core.windows.net/models/opset_8/mnist/mnist.tar.gz). For example, you could use `wget` and `tar` to download and uncompress the tar file.  Please, refer to [MNIST - Handwritten Digit Recognition](https://github.com/onnx/models/tree/master/vision/classification/mnist) about the MNIST CNN model. \n",
    "\n",
    "```bash\n",
    "wget https://onnxzoo.blob.core.windows.net/models/opset_8/mnist/mnist.tar.gz\n",
    "tar xvzf mnist.tar.gz\n",
    "```\n",
    "\n",
    "2). Pull the latest docker image of AI-Serving with ONNX that leverages the CPU version of [ONNX Runtime](https://github.com/microsoft/onnxruntime). Please, refer to [Docker Containers for AI-Serving](https://github.com/autodeployai/ai-serving/tree/master/dockerfiles) about more docker images.\n",
    "\n",
    "```bash\n",
    "docker pull autodeployai/ai-serving\n",
    "```\n",
    "\n",
    "3). Start a docker container of AI-Serving. The port `9090` is the port of HTTP endpoint while `9091` is for gRPC, you could see an error likes `Bind for 0.0.0.0:9090 failed: port is already allocated`, please use another new port instead of the first part as follows `-p $(NEW_PORT):9090` to run the command again, and remember the port is always needed in the URL of HTTP endpoint. \n",
    "\n",
    "```bash\n",
    "docker run --rm -it -v $(PWD):/opt/ai-serving -p 9090:9090 -p 9091:9091 autodeployai/ai-serving\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Additional information about two python files\n",
    "In the current directory, there are two python files `onnx_ml_pb2.py` and `ai_serving_pb2.py`, which are generated from compiling the [two proto files](https://github.com/autodeployai/ai-serving/tree/master/src/main/protobuf) using [protoc](https://developers.google.com/protocol-buffers/docs/pythontutorial), for example, the command as follows:\n",
    "\n",
    "```bash\n",
    "protoc -I=$SRC_DIR --python_out=. ai-serving.proto onnx-ml.proto\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install dependencies\n",
    "We will install python libraries for data manipulation, image manipulation and display:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install numpy\n",
    "!pip install matplotlib"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import dependent libraries\n",
    "Import some dependent libraries that we are going to need to run the MNIST ONNX model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from os import path\n",
    "import numpy as np\n",
    "import requests\n",
    "from matplotlib import pyplot as plt\n",
    "from pprint import pprint\n",
    "\n",
    "import onnx_ml_pb2\n",
    "import ai_serving_pb2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define the base HTTP URL\n",
    "Change the port number `9090` to the appropriate port number if you had changed it during AI-Serving docker instantiation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "port = 9090\n",
    "base_url = 'http://localhost:' + str(port)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Test the server availability\n",
    "Use the specific endpoint `http://host:port/up` to test whether the server has been initialized and is ready to accept requests. The `OK` message indicates it's already available."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_url = base_url + '/up'\n",
    "response = requests.get(test_url)\n",
    "print('The status of the server: ', response.text)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploy the ONNX model into AI-Serving\n",
    "First, we need to deploy the MNIST ONNX model `mnist/model.onnx` into AI-Serving, which can serve multiple models or multiple versions for a named model at once.\n",
    "\n",
    "You must specify a correct content type for ONNX models when constructing an HTTP request to deploy an ONNX model, the candidates are:\n",
    " * application/octet-stream\n",
    " * application/vnd.google.protobuf\n",
    " * application/x-protobuf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The specified servable name\n",
    "model_name = 'mnist'\n",
    "deployment_url = base_url + '/v1/models/' + model_name\n",
    "\n",
    "# The specified content type for the model:\n",
    "headers = {'Content-Type': 'application/x-protobuf'}\n",
    "\n",
    "model_path = path.join('mnist', 'model.onnx')\n",
    "with open(model_path, 'rb') as file:\n",
    "    deployment_response = requests.put(deployment_url, headers=headers, data=file)\n",
    "\n",
    "# The response is a JSON object contains the sepcified servable name and the model version deployed\n",
    "deployment_response_info = deployment_response.json()\n",
    "print('The depoyment response: ', deployment_response_info)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retrieve metadata of the deployed model\n",
    "The metadata will contain model inputs and outputs, which are needed when constructing an input request and consume an output response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model_version = deployment_response_info['version']\n",
    "metadata_url = base_url + '/v1/models/' + model_name + '/versions/' + str(model_version)\n",
    "metadata_response = requests.get(metadata_url)\n",
    "\n",
    "print('The model metadata info:')\n",
    "pprint(metadata_response.json())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the input images\n",
    "We will use the sample test data in the compressed file `mnist.tar.gz`, there are 3 test cases that include both input and output data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "input_tensors = []\n",
    "input_arrays = []\n",
    "output_tensors = []\n",
    "output_arrays = []\n",
    "output_digits = []\n",
    "\n",
    "def postprocess(result):\n",
    "    \"\"\"postprocess the predicted results\"\"\"\n",
    "    return int(np.argmax(np.array(result).squeeze(), axis=0))\n",
    "\n",
    "# Read the three test data sets and show them.\n",
    "fig = plt.figure()\n",
    "model_dir = 'mnist'\n",
    "for i in range(3):\n",
    "    input_test_data_set = path.join(model_dir, 'test_data_set_{0}'.format(i), 'input_0.pb')\n",
    "    output_test_data_set = path.join(model_dir, 'test_data_set_{0}'.format(i), 'output_0.pb')\n",
    "    \n",
    "    # Read the input data\n",
    "    input_tensor = onnx_ml_pb2.TensorProto()\n",
    "    with open(input_test_data_set, 'rb') as f:\n",
    "        input_tensor.ParseFromString(f.read())\n",
    "    input_tensors.append(input_tensor)\n",
    "    input_tensor_array = np.frombuffer(input_tensor.raw_data, dtype=np.float32).astype('float32')\n",
    "    input_arrays.append(input_tensor_array)\n",
    "    \n",
    "    # Read the output data\n",
    "    output_tensor = onnx_ml_pb2.TensorProto()\n",
    "    with open(output_test_data_set, 'rb') as f:\n",
    "        output_tensor.ParseFromString(f.read())\n",
    "    output_tensors.append(output_tensor)\n",
    "    output_tensor_array = np.frombuffer(output_tensor.raw_data, dtype=np.float32).astype('float32')\n",
    "    output_arrays.append(output_tensor_array)\n",
    "    output_digit = postprocess(output_tensor_array)\n",
    "    output_digits.append(output_digit)\n",
    "    \n",
    "    # Add a subplot for the current digit.\n",
    "    plt.subplot(1, 3, i+1)\n",
    "    plt.tight_layout()\n",
    "    plt.imshow(input_tensor_array.reshape([28, 28]), cmap='gray', interpolation='none')\n",
    "    plt.title(\"Digit: {}\".format(output_digit))\n",
    "    plt.xticks([])\n",
    "    plt.yticks([])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## HTTP request formats for the AI-Serving\n",
    "The request for AI-Serving could have two formats: JSON and binary, the HTTP header Content-Type tells the server which format to handle and thus it is required for all requests. The binary payload has better latency, especially for the big tensor value for ONNX models, while the JSON format is easy for human readability.\n",
    "\n",
    "- Content-Type: application/octet-stream, application/vnd.google.protobuf or application/x-protobuf. The request body must be the protobuf message PredictRequest, besides of those common scalar values, it can use the standard TensorProto value directly.\n",
    "\n",
    "\n",
    "- Content-Type: application/json. The request body must be a JSON object formatted as described [here](https://github.com/autodeployai/ai-serving#4-predict-api)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Construct binary requests for the AI-Serving\n",
    "We will create both instances of PredictRequest, one is using the `Records` format that has one case, the other is using the `Split` format that contains two cases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from ai_serving_pb2 import RecordSpec, Record, PredictRequest, ListValue, Value\n",
    "\n",
    "# Create an instance of RecordSpec using `records` that contains only the first tensor.\n",
    "request_message_records = PredictRequest(X=RecordSpec(\n",
    "    records=[Record(fields={'Input3': Value(tensor_value=input_tensors[0])})]\n",
    "))\n",
    "\n",
    "# Create an instance of RecordSpec using `split` that contains the last two tensors.\n",
    "request_message_split = PredictRequest(X=RecordSpec(\n",
    "    columns = ['Input3'],\n",
    "    data = [\n",
    "        ListValue(values=[Value(tensor_value=input_tensors[1])]),\n",
    "        ListValue(values=[Value(tensor_value=input_tensors[2])])\n",
    "    ]\n",
    "))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make the HTTP requests with binary data to the AI-Serving\n",
    "Make predictions using the AI-Serving, the content type of requests with binary data must be one of those three candidates above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "headers = {'Content-Type': 'application/x-protobuf'}\n",
    "\n",
    "# When version is omitted, the latest version is used.\n",
    "prediction_url = base_url + '/v1/models/' + model_name\n",
    "\n",
    "# Make prediction for the `records` request message.\n",
    "prediction_response_records = requests.post(prediction_url, \n",
    "                                           headers=headers, \n",
    "                                           data=request_message_records.SerializeToString())\n",
    "\n",
    "# Make prediciton for the `split` request message.\n",
    "prediction_response_split = requests.post(prediction_url, \n",
    "                                           headers=headers, \n",
    "                                           data=request_message_split.SerializeToString())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Consume the HTTP response with binary data from the AI-serving\n",
    "Having received the results from the server, we are going to parse the \"serialized\" message that we just received for us to make sense of the results. And compare the actual results to the desired ones. \n",
    "\n",
    "**NOTE: The data format of the output response is always the same as the input request.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def print_output_and_compare_result(index, output_tensor):\n",
    "    # Print the actual result for the tensor\n",
    "    actual_output_tensor_array = np.asarray(output_tensor.float_data, dtype=np.dtype('float32'))\n",
    "    print('Actual output shape of test data set {}: '.format(index), output_tensor.dims)\n",
    "    print('Actual output values of test data set {}: '.format(index), actual_output_tensor_array)\n",
    "    print('Actual final recognized digit of test data set {}: '.format(index), postprocess(actual_output_tensor_array))\n",
    "    \n",
    "    # Both results are expected be equal to each other.\n",
    "    np.testing.assert_almost_equal(actual_output_tensor_array, output_arrays[index], 1)\n",
    "    np.testing.assert_equal(postprocess(actual_output_tensor_array), output_digits[index])\n",
    "\n",
    "\n",
    "# Parse the response message from the `recrods` request.\n",
    "response_message = ai_serving_pb2.PredictResponse()\n",
    "response_message.ParseFromString(prediction_response_records.content)\n",
    "\n",
    "# Print and compare the result for the test data set 0\n",
    "print_output_and_compare_result(0, response_message.result.records[0].fields['Plus214_Output_0'].tensor_value)\n",
    "print('\\n----------------------------------------------------------------------------\\n')\n",
    "\n",
    "# Parse the response message from the `split` requesgt.\n",
    "response_message = ai_serving_pb2.PredictResponse()\n",
    "response_message.ParseFromString(prediction_response_split.content)\n",
    "\n",
    "print('Acutal output columns: ', response_message.result.columns)\n",
    "print('\\n----------------------------------------------------------------------------\\n')\n",
    "\n",
    "# Print and compare the result for the test data set 1\n",
    "print_output_and_compare_result(1, response_message.result.data[0].values[0].tensor_value)\n",
    "\n",
    "print('\\n----------------------------------------------------------------------------\\n')\n",
    "\n",
    "# Print and compare the result for the test data set 2\n",
    "print_output_and_compare_result(2, response_message.result.data[1].values[0].tensor_value)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Construct JSON requests for the AI-Serving\n",
    "Create both JSON objects, one is using the `Records` format that has one case, the other is using `Split` that contains two cases."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a JSON object with records that contains only the first tensor.\n",
    "request_json_recoreds = {\n",
    "    'X': [{\n",
    "        'Input3': input_arrays[0].tolist()\n",
    "    }]\n",
    "}\n",
    "\n",
    "# Create a JSON object with columns and data that contains the last two tensors.\n",
    "request_json_split = {\n",
    "    'X': {\n",
    "        'columns': ['Input3'],\n",
    "        'data': [[input_arrays[1].tolist()], [input_arrays[2].tolist()]]\n",
    "    }\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make the HTTP requests with JSON data to the AI-Serving\n",
    "Make predictions using the AI-Serving, the content type of requests with JSON data must be `application/json`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# When version is omitted, the latest version is used.\n",
    "prediction_url = base_url + '/v1/models/' + model_name\n",
    "\n",
    "# The Content-Type: application/json is specified implicitly when using json instead of data\n",
    "prediction_json_response_records = requests.post(prediction_url, json=request_json_recoreds)\n",
    "prediction_json_response_split = requests.post(prediction_url, json=request_json_split)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Consume the HTTP response with JSON data from the AI-serving\n",
    "Having received the results from the server, we are going to parse the JSON text that we just received for us to make sense of the results. And compare the actual results to the desired ones.\n",
    "\n",
    "**NOTE: The data format of the output response is always the same as the input request.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def print_json_output_and_compare_result(index, output_list):\n",
    "    # Print the actual result for the tensor\n",
    "    actual_output_tensor_array = np.asarray(output_list, dtype=np.dtype('float32')).reshape(output_arrays[index].shape)\n",
    "    print('Actual output shape of test data set {}: '.format(index), actual_output_tensor_array.shape)\n",
    "    print('Actual output values of test data set {}: '.format(index), actual_output_tensor_array)\n",
    "    print('Actual final recognized digit of test data set {}: '.format(index), postprocess(actual_output_tensor_array))\n",
    "    \n",
    "    # Both results are expected be equal to each other.\n",
    "    np.testing.assert_almost_equal(actual_output_tensor_array, output_arrays[index], 1)\n",
    "    np.testing.assert_equal(postprocess(actual_output_tensor_array), output_digits[index])\n",
    "\n",
    "\n",
    "# Parse the json response from the `recrods` request.\n",
    "response_json = prediction_json_response_records.json()\n",
    "print('The json response from the `records` request:')\n",
    "pprint(response_json)\n",
    "print('\\n----------------------------------------------------------------------------\\n')\n",
    "\n",
    "# Print and compare the result for the test data set 0\n",
    "print_json_output_and_compare_result(0, response_json['result'][0]['Plus214_Output_0'])\n",
    "print('\\n----------------------------------------------------------------------------\\n')\n",
    "\n",
    "# Parse the json response from the `split` requesgt.\n",
    "response_json = prediction_json_response_split.json()\n",
    "print('The json response from the `split` request:')\n",
    "pprint(response_json)\n",
    "print('\\n----------------------------------------------------------------------------\\n')\n",
    "\n",
    "# Print and compare the result for the test data set 1\n",
    "print_json_output_and_compare_result(1, response_json['result']['data'][0])\n",
    "\n",
    "print('\\n----------------------------------------------------------------------------\\n')\n",
    "\n",
    "# Print and compare the result for the test data set 2\n",
    "print_json_output_and_compare_result(2, response_json['result']['data'][1])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
